Japanese Sentence Order Estimation using Supervised Machine Learning with Rich Linguistic Clues
Abstract
Estimation of sentence order (sometimes referred to as sentence ordering) is one of the problems that arise in sentence generation and sentence correction. When generating a text that consists of multiple sentences, the sentences must be arranged in an appropriate order so that the text can be understood easily. In this study, we propose a new method for Japanese sentence order estimation that uses supervised machine learning with rich linguistic clues. As one of these clues, we use the concepts of old information and new information. In Japanese, phrases containing old or new information can be detected by using topic-marking postpositional particles. In sentence order estimation experiments, the accuracies of our proposed method (0.72 to 0.77) were higher than those of a probabilistic method based on an existing method (0.58 to 0.61). We also examined the features experimentally to determine which were important for sentence order estimation, and found that the feature based on old and new information was the most important.
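The idea that topic-marking particles signal old information can be illustrated with a small sketch. This is not the authors' feature set, which the abstract does not specify. It assumes the particle は as the only topic marker and uses a crude regular expression in place of a real morphological analyzer. The names `topic_phrases` and `old_info_feature` are hypothetical.

```python
import re

# Assumption: a crude stand-in for morphological analysis. It captures the
# run of kanji/katakana characters immediately before the topic particle "は".
TOPIC_PATTERN = re.compile(r"([\u4e00-\u9fff\u30a0-\u30ff]+)は")


def topic_phrases(sentence: str) -> list[str]:
    """Return the phrases that the particle は marks as topics."""
    return TOPIC_PATTERN.findall(sentence)


def old_info_feature(first: str, second: str) -> bool:
    """True if a topic-marked phrase of `second` already appears in `first`.

    A topic is usually old information, so a True value is weak evidence
    that `first` precedes `second`.
    """
    return any(phrase in first for phrase in topic_phrases(second))


a = "昨日、新しい図書館が開館した。"  # "A new library opened yesterday."
b = "図書館は駅の近くにある。"        # "The library is near the station."

forward = old_info_feature(a, b)   # topic of b was introduced in a
backward = old_info_feature(b, a)  # a has no topic-marked phrase
```

In a supervised setting, a boolean like this would be one feature among many given to the classifier for each candidate sentence pair, rather than a decision rule on its own.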
Similar resources
MT Quality Estimation for E-Commerce Data
In this paper we present a system that automatically estimates the quality of machine-translated segments of e-commerce data without relying on reference translations. Such an approach can be used to estimate the quality of machine-translated text in scenarios in which references are not available. Quality estimation (QE) can be applied to select translations to be post-edited, choose the best tran...
Comparison of Different Machine Learning Methods for Persian Speech-to-Speech Extractive Summarization Without Using Transcripts
In this paper, extractive speech summarization using different machine learning algorithms was investigated. The task of speech summarization deals with extracting important and salient segments from speech so that speech files can be accessed, searched, extracted, and browsed more easily and at lower cost. In this paper, a new method for speech summarization without using automatic speech recognitio...
A Readable Read: Automatic Assessment of Language Learning Materials based on Linguistic Complexity
Corpora and web texts can become a rich language learning resource if we have a means of assessing whether they are linguistically appropriate for learners at a given proficiency level. In this paper, we aim at addressing this issue by presenting the first approach for predicting linguistic complexity for Swedish second language learning material on a 5-point scale. After showing that the tradi...
Sentence Subjectivity Detection with Weakly-Supervised Learning
This paper presents a hierarchical Bayesian model based on latent Dirichlet allocation (LDA), called subjLDA, for sentence-level subjectivity detection, which automatically identifies whether a given sentence expresses opinion or states facts. In contrast to most of the existing methods relying on either labelled corpora for classifier training or linguistic pattern extraction for subjectivity ...
Recent Advances in Example-Based Machine Translation
This book, an outcome of a 2001 workshop on Example-Based Machine Translation (EBMT) in Santiago de Compostela, very appropriately starts with a preface by Professor Makoto Nagao in which he explains how the limits of rule-based Machine Translation (MT) led him to propose his translation by analogy principle in 1981 (published as Nagao, 1984). His idea, inspired by second language learning meth...
Publication date: 2013